Model#
The model module provides classes for the different language models used in the LMCSC (Language Model-based Corrector with Semantic Constraints) system. It includes a base class LMModel and several model-specific implementations.
Key Components#
LMModel: Base class for language models.
QwenModel: Class for Qwen language models.
LlamaModel: Class for Llama language models.
BaichuanModel: Class for Baichuan language models.
InternLM2Model: Class for InternLM2 language models.
UerModel: Class for UER language models.
AutoLMModel: Factory class for automatically selecting and instantiating the appropriate language model.
LMModel#
The LMModel class serves as the base class for all language models in the LMCSC system. It provides common functionality and interfaces for working with different types of language models; a minimal subclass sketch follows the feature list below.
Key Features:#
Initialization with pre-trained models
Tokenization and vocabulary management
Beam search preparation and output processing
Model parameter counting
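
Below is a minimal sketch (not taken from the LMCSC source) of how a subclass might fill in the hooks that the API reference documents as NotImplementedError stubs. Only the method names come from the documentation; the attribute names set in the bodies are illustrative assumptions.

from lmcsc.model import LMModel

class MyCustomModel(LMModel):
    def set_decoder_start_token_id(self):
        # Assumption: reuse the tokenizer's BOS token as the decoder start.
        self.decoder_start_token_id = self.tokenizer.bos_token_id

    def set_vocab_size(self):
        # Assumption: the vocabulary size comes straight from the tokenizer.
        self.vocab_size = len(self.tokenizer)

    def set_convert_ids_to_tokens(self):
        # Assumption: delegate id-to-token conversion to the tokenizer.
        self.convert_ids_to_tokens = self.tokenizer.convert_ids_to_tokens

    def get_model_kwargs(self):
        # Assumption: no model-specific generation kwargs are needed.
        return {}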
Model-Specific Classes#
The module provides specific implementations for various language models:
QwenModel: Optimized for Qwen models, with support for FlashAttention2.
LlamaModel: Tailored for Llama models, with specific tokenization and padding strategies.
BaichuanModel: Designed for Baichuan models, with custom token handling.
InternLM2Model: Specialized for InternLM2 models.
UerModel: Adapted for UER models, with specific output processing.
Each of these classes inherits from LMModel and overrides certain methods to accommodate the unique characteristics of its respective model architecture.
AutoLMModel#
The AutoLMModel class provides a convenient way to instantiate the appropriate language model based on the model name or path. It automatically selects the correct model class and initializes it with the given parameters.
Example:
from lmcsc.model import AutoLMModel
# Create a Qwen model
qwen_model = AutoLMModel.from_pretrained("qwen-7b")
# Create a Llama model
llama_model = AutoLMModel.from_pretrained("llama-7b")
# Create a Baichuan model
baichuan_model = AutoLMModel.from_pretrained("Baichuan2-7B-Base")
This factory pattern allows for easy integration of new model types and simplifies the process of working with different language models within the LMCSC system.
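
The dispatch itself is straightforward to picture. The following sketch shows a name-based selection in the spirit of the factory; it is illustrative only, and the actual matching rules inside AutoLMModel may differ.

from lmcsc.model import QwenModel, LlamaModel, BaichuanModel

def pick_model_class(name: str):
    # Illustrative dispatch on substrings of the model name; the real
    # factory's rules may differ.
    lowered = name.lower()
    if "qwen" in lowered:
        return QwenModel
    if "llama" in lowered:
        return LlamaModel
    if "baichuan" in lowered:
        return BaichuanModel
    # The documented behavior: unsupported model types raise ValueError.
    raise ValueError(f"Unsupported model type: {name}")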
API Documentation#
- class lmcsc.model.LMModel(model: str, attn_implementation: str | None = None, *args, **kwargs)[source]#
Bases:
object
A base class for language models.
- Parameters:
model (str) – The name or path of the pre-trained model.
attn_implementation (str, optional) – The attention implementation to use. Defaults to None.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- model_name#
The name of the model.
- Type:
str
- model#
The loaded language model.
- Type:
AutoModelForCausalLM
- tokenizer#
The tokenizer for the model.
- Type:
AutoTokenizer
- vocab#
The vocabulary of the model.
- Type:
dict
- is_byte_level_tokenize#
Whether the tokenization is byte-level.
- Type:
bool
- set_decoder_start_token_id()[source]#
Sets the decoder start token ID.
- Raises:
NotImplementedError – This method should be implemented by subclasses.
- set_vocab_size()[source]#
Sets the vocabulary size.
- Raises:
NotImplementedError – This method should be implemented by subclasses.
- set_convert_ids_to_tokens()[source]#
Sets the convert_ids_to_tokens function.
- Raises:
NotImplementedError – This method should be implemented by subclasses.
- decorate_model_instance()[source]#
Decorates the model instance with additional attributes and settings.
- get_model_kwargs()[source]#
Gets the model-specific keyword arguments.
- Raises:
NotImplementedError – This method should be implemented by subclasses.
- prepare_beam_search_inputs(src: List[str], contexts: List[str] | None = None, prompt_split: str = '\n', n_beam: int = 8, n_beam_hyps_to_keep: int = 1)[source]#
Prepares inputs for beam search.
- Parameters:
src (List[str]) – The source sentences.
contexts (List[str], optional) – The context for each source sentence. Defaults to None.
prompt_split (str, optional) – The prompt split token. Defaults to "\n".
n_beam (int, optional) – The number of beams. Defaults to 8.
n_beam_hyps_to_keep (int, optional) – The number of beam hypotheses to keep. Defaults to 1.
- Returns:
A tuple containing model_kwargs, context_input_ids, context_attention_mask, and beam_scorer.
- Return type:
tuple
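
As a hedged usage sketch, the call and the unpacking of the documented return tuple might look like this; the variable names follow the "Returns" description above, and how the tuple feeds the decoding loop is system-specific.

from lmcsc.model import AutoLMModel

lm = AutoLMModel.from_pretrained("Baichuan2-7B-Base")
src = ["今天天气怎么样", "我想去北京"]

# Unpack the documented four-element tuple.
model_kwargs, context_input_ids, context_attention_mask, beam_scorer = (
    lm.prepare_beam_search_inputs(src, n_beam=8, n_beam_hyps_to_keep=1)
)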
- prepare_prompted_inputs(src: List[str])[source]#
Prepares prompted inputs for generation.
- Parameters:
src (List[str]) – The source sentences.
- Returns:
A tuple containing model_kwargs, context_input_ids, context_attention_mask, and beam_scorer.
- Return type:
tuple
- process_generated_outputs(outputs, contexts: List[str] | None = None, prompt_split: str = '\n', n_beam_hyps_to_keep: int = 1, need_decode: bool = True)[source]#
Processes the generated outputs.
- Parameters:
outputs – The generated outputs.
contexts (List[str], optional) – The context for each output. Defaults to None.
prompt_split (str, optional) – The prompt split token. Defaults to "\n".
n_beam_hyps_to_keep (int, optional) – The number of beam hypotheses to keep. Defaults to 1.
need_decode (bool, optional) – Whether to decode the outputs. Defaults to True.
- Returns:
The processed predictions.
- Return type:
List[List[str]]
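
The documented return shape is List[List[str]]: one inner list of n_beam_hyps_to_keep hypotheses per source sentence. A hedged sketch of consuming it, where outputs stands in for whatever the decoding loop produced:

def best_corrections(lm, outputs):
    # One inner list per source sentence; take the first kept hypothesis.
    predictions = lm.process_generated_outputs(
        outputs, n_beam_hyps_to_keep=1, need_decode=True
    )
    return [hyps[0] for hyps in predictions]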
- class lmcsc.model.ChatLMModel(model: str, attn_implementation: str | None = None, *args, **kwargs)[source]#
Bases:
LMModel
A base class for chat-prompted language models.
- prepare_prompted_inputs(src: List[str])[source]#
Prepares prompted inputs for generation using chat-style prompting.
- Parameters:
src (List[str]) – The source sentences.
- Returns:
A tuple containing model_kwargs, context_input_ids, context_attention_mask, and beam_scorer.
- Return type:
tuple
- class lmcsc.model.QwenModel(model, *args, **kwargs)[source]#
Bases:
LMModel
A class for Qwen language models.
- Parameters:
model (str) – The name or path of the pre-trained Qwen model.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- get_model_kwargs()[source]#
Gets the model-specific keyword arguments for Qwen models. Unlike other models, Qwen uses <|endoftext|> as both the eos_token and the pad_token, and it uses DynamicCache for past_key_values.
- Returns:
A dictionary of keyword arguments.
- Return type:
dict
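
As a hedged illustration of the docstring above, the Qwen-specific kwargs could resemble the dictionary below; the exact keys in the real implementation may differ.

from transformers import DynamicCache

qwen_kwargs = {
    # Per the docstring, <|endoftext|> serves as both eos and pad token.
    "eos_token": "<|endoftext|>",
    "pad_token": "<|endoftext|>",
    # Per the docstring, Qwen uses DynamicCache for past_key_values.
    "past_key_values": DynamicCache(),
}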
- class lmcsc.model.ChatQwenModel(model, *args, **kwargs)[source]#
Bases:
ChatLMModel, QwenModel
- class lmcsc.model.LlamaModel(model, *args, **kwargs)[source]#
Bases:
LMModel
A class for Llama language models.
- Parameters:
model (str) – The name or path of the pre-trained Llama model.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- prepare_beam_search_inputs(src: List[str], contexts: List[str] | None = None, prompt_split: str = '\n', n_beam: int = 8, n_beam_hyps_to_keep: int = 1)[source]#
Prepares inputs for beam search for Llama models.
- Parameters:
src (List[str]) – The source sentences.
contexts (List[str], optional) – The context for each source sentence. Defaults to None.
prompt_split (str, optional) – The prompt split token. Defaults to "\n".
n_beam (int, optional) – The number of beams. Defaults to 8.
n_beam_hyps_to_keep (int, optional) – The number of beam hypotheses to keep. Defaults to 1.
- Returns:
A tuple containing model_kwargs, context_input_ids, context_attention_mask, and beam_scorer.
- Return type:
tuple
- class lmcsc.model.ChatLlamaModel(model, *args, **kwargs)[source]#
Bases:
ChatLMModel, LlamaModel
- class lmcsc.model.BaichuanModel(model, *args, **kwargs)[source]#
Bases:
LMModel
A class for Baichuan language models.
- Parameters:
model (str) – The name or path of the pre-trained Baichuan model.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- class lmcsc.model.ChatBaichuanModel(model, *args, **kwargs)[source]#
Bases:
ChatLMModel, BaichuanModel
- class lmcsc.model.InternLM2Model(model, *args, **kwargs)[source]#
Bases:
LMModel
A class for InternLM2 language models.
- Parameters:
model (str) – The name or path of the pre-trained InternLM2 model.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- class lmcsc.model.ChatInternLM2Model(model, *args, **kwargs)[source]#
Bases:
ChatLMModel, InternLM2Model
- class lmcsc.model.UerModel(model, *args, **kwargs)[source]#
Bases:
LMModel
A class for UER language models.
- Parameters:
model (str) – The name or path of the pre-trained UER model.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- get_model_kwargs()[source]#
Gets the model-specific keyword arguments for UER models.
- Returns:
A dictionary of keyword arguments.
- Return type:
dict
- process_generated_outputs(outputs, contexts: List[str] | None = None, prompt_split: str = '\n', n_beam_hyps_to_keep: int = 1)[source]#
Processes the generated outputs for UER models.
- Parameters:
outputs – The generated outputs.
contexts (List[str], optional) – The context for each output. Defaults to None.
prompt_split (str, optional) – The prompt split token. Defaults to "\n".
n_beam_hyps_to_keep (int, optional) – The number of beam hypotheses to keep. Defaults to 1.
- Returns:
The processed predictions.
- Return type:
List[List[str]]
- class lmcsc.model.ChatUerModel(model, *args, **kwargs)[source]#
Bases:
ChatLMModel, UerModel
- class lmcsc.model.AutoLMModel[source]#
Bases:
object
A factory class for automatically selecting and instantiating the appropriate language model.
This class provides a static method to create instances of specific language model classes based on the model name or path provided.
- static from_pretrained(model: str, use_chat_prompted_model: bool = False, *args, **kwargs)[source]#
Creates and returns an instance of the appropriate language model class based on the model name.
- Parameters:
model (str) – The name or path of the pre-trained model.
use_chat_prompted_model (bool, optional) – Whether to instantiate the chat-prompted (Chat*) variant of the model. Defaults to False.
*args – Variable length argument list to be passed to the model constructor.
**kwargs – Arbitrary keyword arguments to be passed to the model constructor.
- Returns:
An instance of the appropriate language model class.
- Return type:
LMModel
- Raises:
ValueError – If an unsupported model type is specified.
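
Given the signature above, the use_chat_prompted_model flag presumably selects the Chat* variants (for example, ChatQwenModel instead of QwenModel). A hedged usage sketch:

from lmcsc.model import AutoLMModel

# Instantiate the chat-prompted variant of a Qwen model.
chat_model = AutoLMModel.from_pretrained("qwen-7b", use_chat_prompted_model=True)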